Bridging the gap from darkness to solar brilliance

A UN Datathon Story

Janith Wanniarachchi

EBS, Monash

David Wu

EBS, Monash

Sundance Sun

Education, Melbourne

James Hogg

Maths, QUT

Farhan Ameen

Maths & Stats, USyd

February 29, 2024

The official bit

The Datathon

The project brief

  • Create a data solution
  • that tackles local sustainable development challenges
  • and leverages one of the six key transitions
  • with a focus on SDG localisation as an enabler.

Our project

Aim: Slap together a half-baked solution in 3 days.

Problem: Globally, nearly a billion people lack reliable energy, and solar power is a cost-effective way to meet that demand.

Solution: Map the areas of the globe where solar farm investment is likely to succeed, using existing solar farms as training data; overlay that onto a map of energy demand, proxied by night-light data.
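
In raster terms the overlay is just element-wise arithmetic on two gridded layers. A minimal sketch with terra, where the file names and the combination rule are assumptions rather than the project's exact code:

library(terra)

# Assumed inputs: model-predicted suitability and the demand proxy,
# both already on the same global grid
suitability = terra::rast("predicted_power_ratio.tif")
demand      = terra::rast("demand_proxy.tif")

# One plausible combination rule: prioritise cells that score highly on both
priority = suitability * demand
terra::plot(priority)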

Data Sources

| Quantity | Source | Format |
|---|---|---|
| Population density | Google Earth Engine, provided by Oak Ridge National Laboratory | tiff |
| Night light intensity | NASA, Earth at Night project | tiff |
| Biomass/land use | NASA | tiff |
| Terrain slope | Google Earth Engine, provided by USGS | tiff |
| Photovoltaic potential | Global Solar Atlas | tiff |
| Solar farm locations | S. Dunnett, hosted on awesome-gee-community-catalog and figshare | csv |

Concordance

All data were remapped from their raw forms onto a consistent global grid.

library(raster)
library(terra)
library(dplyr)

# A global 0.1-degree grid acting as the common target resolution
rasterGrid = raster(ncols = 3600, nrows = 1800,
                    xmn = -180, xmx = 180,
                    ymn = -90, ymx = 90)
baseRaster = terra::rast(rasterGrid)

# Read one raw tiff and resample it onto the common grid
rawValues = terra::rast(tiffFile)
consistentValues = terra::resample(rawValues, baseRaster, method = "bilinear")

# Flatten to one row per grid cell, keeping coordinates and a cell id
valueDataFrame = as.data.frame(consistentValues, xy = TRUE, na.rm = FALSE) %>% 
    mutate(id = 1:ncell(consistentValues))

Rough Model Details: Power Ratio

Regress the per-area power production of existing solar farms on a laughably small number of factors (photovoltaic potential, land use, terrain slope).

Using “spatial” “random forest”.

library(caret)

# Power density regressed on a handful of covariates; latitude and
# longitude as features are what make the random forest loosely "spatial"
form = power_density ~ biomass + slope + photovoltaic_potential + lat + lon

caret::train(
  form,
  method = "ranger",
  ...   # data, resampling, and tuning arguments omitted
)
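
Downstream, the fitted model is just a predictor over the concorded grid. A sketch of that step, where solarFarms and gridData are assumed names rather than the actual objects:

# Fit on the existing farms, then score every grid cell that carries
# the same covariates
fit = caret::train(form, data = solarFarms, method = "ranger")
gridData$predicted_power = predict(fit, newdata = gridData)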

Rough Model Details: Energy Demand

Demand was modelled using a proxy quantity constructed from night-light intensity and population density.

import numpy as np
import polars as pl

# Demand proxy: high where population is dense but night lights are dim.
# Density enters inverted (255 - density), so heavily populated cells get
# a small log_density and hence a large demand; the log10(256) offset
# makes a dark, empty cell score exactly zero.
(regressors.lazy()
  .with_columns(
    log_density = (255 - pl.col('density') + 1).log10(),
    log_nightlight = (pl.col('nightlight') + 1).log10(),
  )
  .with_columns(
    demand = -(pl.col('log_density') + pl.col('log_nightlight')) + np.log10(256)
  )
  .select('x', 'y', 'demand')
  .collect()  # materialise the lazy query
)

Shiny App
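
The app itself was a thin layer over the gridded outputs. A minimal sketch of the idea with shiny and leaflet, where the file and object names are assumptions, not the actual app:

library(shiny)
library(leaflet)
library(raster)

# Assumed: the combined priority layer saved out by the pipeline
priority = raster("priority.tif")

ui = fluidPage(
  titlePanel("Bridging the gap from darkness to solar brilliance"),
  leafletOutput("map", height = 600)
)

server = function(input, output, session) {
  output$map = renderLeaflet({
    leaflet() %>%
      addTiles() %>%                           # base map
      addRasterImage(priority, opacity = 0.7)  # model output overlay
  })
}

shinyApp(ui, server)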

The experience

Day 1

So none of us had much experience with spatial data.

The UN provided “data sources”, but it was just a shotgun list of other lists.

Most of the day was spent collecting and sourcing data.

Our initial focus was on Africa, but we couldn’t find nice shapefiles or very local data for the region.

Limitation: we wanted spatial data at a lower-than-country resolution over a global span.

  • Work was done on an AWS EC2 instance that had RStudio Server installed.
    • AWS has an image with RStudio Server, but you need to add the AMI, which we couldn’t do.
    • So I chose the wrong instance image (Amazon Linux) and spent a day building dependencies from source.
    • I had to do this again when I upgraded the disk size.

Day 2

  • Data was concorded onto a regular grid for analysis.
    • We don’t talk about how long I spent doing this in Python before it was refactored into a single call in R.
  • We initially used a Facebook/Amazon data source for population data, but gave up because the data came broken up into tiles that didn’t match up after reconstruction.
  • We moved to the Google Earth Engine API, which we had previously used to find terrain data.
    • Data exports to Drive, but with no progress indicator.

End of day:

  • First model built
  • Models were trained overnight

Day 3

The R Shiny app was built while the model was still being iterated on.

Submitted at 6.30pm. People watched until 9pm. Everywhere good for food was closed or in the process of closing.

The learnings

Learnings

  • Impress people with fancy graphics.

    • Use AI to generate images.
  • Communicating the trash you have assembled is (more) important (than the quality of the trash you collect).

  • R and Python can work together (see the sketch below).

  • Spatial data is a pain in the ass to work with.
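
On the R/Python point, the simplest glue is reticulate. A minimal sketch, with script and file names assumed rather than taken from the project:

library(reticulate)

# Run the Python/polars demand pipeline from inside an R session
py_run_file("demand.py")

# Pick the output back up in R for modelling and the Shiny app
demand = read.csv("demand.csv")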

Acknowledgements

  • QUT Centre for Data Science

  • ADSN

  • George

  • Farhan

  • Sundance

  • Jamie

  • Tim

  • Michael